-
- O B J A S M
- An object file to assembly language file conversion utility
-
- INTRODUCTION
- ------------
-
- Under normal compilation procedures, most programs are converted into an
- intermediate notation before being converted into executable code. This
- notation simplifies the process of managing large programs by allowing the
- programmer to divide a program into many modules. When changes are made
- to a module, the programmer need only compile that module rather than
- the entire program. For most 80x86 and compatible processors, this notation
- is stored in a .OBJ file.
-
- After compilation, the .OBJ file must be converted into a machine
- executable form. At this stage the various modules must be combined and
- any inter-module dependencies must be resolved. This process is called
- linking and is accomplished by LINK, LINK86, or PLINK86 (depending on which
- vendor you purchase it from). To resolve these dependencies, the
- inter-module dependency information must be stored in the .OBJ file. This
- information normally consists of routine names, variable names, and other
- globally known quantities.
-
- For the most part, once a program has been compiled into its .OBJ file,
- there is no going back. There is no way to convert the .OBJ file back into
- its original language. For that reason, many companies that wish
- to sell their proprietary routines to programmers do so in .OBJ file
- form. Normally these companies send out a library file (.LIB). .LIB files
- contain one or more .OBJ files. The program which groups .OBJ files into a
- .LIB file is known as LIB, LIB86, or PLIB86. This method of distribution
- has made routine vendors happy for many years.
-
- But this distribution of encoded routines has meant headaches for
- programmers. Programmers who discovered bugs in the routines, or wanted to
- make enhancements had to contact the vendor and plead for help. The
- programmers were at the mercy of the routine vendors.
-
- Fortunately, OBJASM came along. OBJASM is a utility which can convert
- .OBJ files back into assembly language, much like a dis-assembler. Unlike
- a dis-assembler however, OBJASM is able to determine the names of the routines
- that it is dis-assembling. Globally known data names can be determined as
- well. OBJASM is also able to determine the segments of the original routines.
- Segments are logical/physical divisions of sections of routines; typical
- segments are code, data, stack, etc. With this additional knowledge, OBJASM
- is able to produce an assembly language listing which is easier to read, easier
- to understand, and easier to modify than normal dis-assembler output.
-
- OBJASM output is compatible with MASM version 4.0 or later. It runs
- under MS-DOS version 3.0 or later although it is able to process .OBJ files
- (as specified by INTEL, document order no. 121748-001) from many other 80x86
- operating systems such as CP/M-86, XENIX 80x86, OS/2 and more. I believe that
- the format for UNIX (it has .o files) is completely different.
-
- COMPILING OBJASM
- ----------------
-
- OBJASM was written for Microsoft C version 4.0 or later, and there is
- a supplied 'make' file for that environment. To execute the 'make' file,
- set up your Microsoft C environment and go to the directory which contains
- the OBJASM source files. Make sure that the Microsoft C utility
- programs are reachable through your 'PATH' environment variable, and type
- 'MAKE O' from the DOS prompt. The 'make' file compiles OBJASM for
- the large model.
-
- One of the modules (ouinsert.c) will give a compiler warning message, or
- may fail to compile, depending on how strict your compiler settings are. The
- message has something to do with 'casting far pointer to long' or vice-versa.
- Make sure your compiler gives only a warning message and proceeds with the
- compilation process. The warning message can be ignored. Data structure
- fanatics: the program uses threaded balanced binary trees for many internal
- lists and the ouinsert module is the insert-into-a-list function. The root
- node of each tree has the tree's height (an integer) stored in one of the
- structure's pointer-type members. This is non-portable C code, but it's ok for most C
- compilers. These threaded balanced binary tree routines were taken from a
- set of generic list handling functions.
-
- The modules contained in OBJASM are as follows:
-
- omain.c main() processes command line and passes through input file
- oprocess.c dis-assembly/dumping controlling routines
- odisasm.c routines to dis-assemble the various kinds of 80x86
- instructions
- oreport.c routines to display some of the internal lists (publics and
- externals)
- ooutput.c routines to display various pieces of the dis-assembled .OBJ
- routine
- orXXXXXX.c (all except oreport.c) routines to process the various INTEL
- standard .OBJ record formats
- oubuff.c buffer I/O routines for the input file
- ouinsert.c list insertion routine
- oufind.c list searching routine
- ounewtre.c list creation routine
- ouinitre.c initialize all lists
- oufmterr.c .OBJ format error handling routine
- ouget.c input file formatting (into chars,ints,longs,etc) routines
- oumalloc.c internal memory allocation routine (with check for out of
- memory error)
- oustruct.c assembly language structure routines
- oextra.c additional information file processing routines
-
- In addition, some of the modules (omain for example) have some debugging
- statements conditionally compiled. If OBJASM exhibits strange behavior and
- you wish to trace it down, re-compile these modules with the /DDEBUG option
- (for MSC). This will define the preprocessor variable DEBUG, which will
- enable the conditionally compiled debugging statements.
-
-
- EXECUTING OBJASM
- ----------------
-
- OBJASM is executed from the command line as:
-
- OBJASM -options filename
- OBJASM -options filename.OBJ
-
- If you are dis-assembling a .LIB file, then you must use your LIB, LIB86, or
- PLIB86 utility to extract the .OBJ files.
-
- All output is sent to standard output (normally the console), but can be
- redirected using the '>' symbol. A normal OBJASM execution would look like this:
-
- OBJASM OMAIN >OMAIN.ASM
-
- The output would then be stored in the file omain.asm. Output contains tab
- (control-I) characters to separate the various assembly language columns.
- A normal line format would appear as:
-
- [LABEL]: (tab) [INSTRUCTION] (tab) [OPERANDS] (tab) ;[COMMENT]
-
- start: mov ds,ax ; dummy comment
-
- The options currently available for OBJASM are as follows:
-
- Option Description
- ------ ---------------------------------------------------------------
- -4 Make MASM v4.0 compatible output (No retf)
- -a Add labels for better output (use with EXEOBJ)
- -h Add hex comments showing bytes dis-assembled
- -r Make RASM86 compatible output (special segment directives, etc)
- -c Set the minimum string size in a code segment. This option is followed
- by a number which represents the new value. E.g. -c20
- -s Set the minimum string size in a data segment. This option is followed
- by a number which represents the new value. E.g. -s10
- -f Specifies an additional information file. The additional information
- file must be given in parentheses. E.g. -f(myfile) The additional
- information file's default extension is ".add", so in the example
- above the file specified would be "myfile.add". For more
- information see the ADDITIONAL INFORMATION FILE section.
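
The -c, -s, and -f option forms described above can be recognized with a few lines of C. The following is only a sketch: parse_f_option and parse_size_option are hypothetical names, not routines from omain.c, and it assumes that the number after -c or -s is decimal and that any '.' in the file name means an extension was supplied.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from omain.c): extracts the name from
 * "-f(name)" and appends ".add" when the name contains no '.'.
 * Returns 0 on success, -1 if the argument is malformed or too long. */
int parse_f_option(const char *arg, char *out, size_t outsize)
{
    const char *open, *close;
    size_t len;

    if (strncmp(arg, "-f(", 3) != 0)
        return -1;
    open = arg + 3;
    close = strchr(open, ')');
    if (close == NULL || close == open)
        return -1;
    len = (size_t)(close - open);
    if (len + 5 > outsize)              /* room for ".add" and the NUL */
        return -1;
    memcpy(out, open, len);
    out[len] = '\0';
    if (strchr(out, '.') == NULL)       /* assumed rule for the default */
        strcat(out, ".add");
    return 0;
}

/* Hypothetical helper: reads the number that follows -c or -s.
 * Assumes a decimal value. Returns the value, or -1 when no valid
 * number follows the option letter. */
long parse_size_option(const char *arg, char letter)
{
    char *end;
    long value;

    if (arg[0] != '-' || arg[1] != letter || arg[2] == '\0')
        return -1;
    value = strtol(arg + 2, &end, 10);
    if (*end != '\0' || value < 0)
        return -1;
    return value;
}
```

Under these assumptions, parse_f_option("-f(myfile)", ...) yields "myfile.add", and parse_size_option("-c20", 'c') yields 20.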
-
-
- Technical Information
- ---------------------
-
- Most information used in writing OBJASM was taken from the following
- public documents:
-
- INTEL 8086 RELOCATABLE OBJECT MODULE FORMATS (Order No. 121748-001)
- MICROSOFT OMF Specification (Dated February 18,1986; this is an edited
- copy of the INTEL document)
- MS-DOS ENCYCLOPEDIA
-
- These documents describe the .OBJ file format. Here is a brief
- summary of the format. Each .OBJ file contains one or more records.
- Each record has a record type (which specifies its purpose), a record length,
- and record data (which depends on the record type). INTEL outlines 30
- different record types, most of which are never used. MICROSOFT added to
- these record types and removed the ones which it didn't need. The record
- types which OBJASM supports are listed below.
-
- Record Type Type Name Description
- ----------- ----------- -----------------------------------
- 80h THEADR Module Header Record
- 88h COMENT Comment Record
- 8Ah MODEND End of Module Record
- 8Ch EXTDEF External Dependency Definition
- 90h PUBDEF Public Value Definition
- 96h LNAMES List of Internally Referenced Names
- 98h GRPDEF Group Definition
- 9Ah SEGDEF Segment Definition
- 9Ch FIXUPP Dependency Fixing-up Record
- A0h LEDATA (Logically) Enumerated Data Record
- A2h LIDATA (Logically) Iterated Data Record
- B0h COMDEF Communal Value Definition
- B4h LEXTDEF Local External Definitions (C static routines)
- B6h LPUBDEF Local Public Definitions (??)
- B7h LPUBDF2 Local Public Definitions (?? Another case? )
- B8h LCOMDEF Local Communal Value Definitions (??)
-
- The C modules which handle the records of the above formats are named by
- placing the characters "OR" before the type name and have a .C extension.
- For example, the module named "ORSEGDEF.C" contains the C routine to handle
- the segment definition records.
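
The record layout summarized above (type, length, data) can be walked with a small routine like the one below. This is a sketch, not code from oubuff.c or ouget.c: the structure and function names are illustrative, and the field widths (a one-byte type followed by a two-byte little-endian length that counts the data plus a trailing checksum byte) come from the published OMF layout rather than from the text above.

```c
#include <stddef.h>

/* Illustrative structure; the field names are not from the OBJASM source. */
struct omf_record {
    unsigned char type;          /* record type, e.g. 0x80 for THEADR     */
    unsigned int length;         /* bytes that follow the length field    */
    const unsigned char *data;   /* first byte after the length field     */
};

/* Reads one record starting at buf, where avail bytes remain.
 * Returns the total number of bytes the record occupies (3 header
 * bytes plus rec->length), or 0 if the buffer is too short. */
size_t omf_read_record(const unsigned char *buf, size_t avail,
                       struct omf_record *rec)
{
    if (avail < 3)
        return 0;
    rec->type = buf[0];
    /* The length is stored low byte first. */
    rec->length = (unsigned int)buf[1] | ((unsigned int)buf[2] << 8);
    if (avail - 3 < rec->length)
        return 0;
    rec->data = buf + 3;         /* the last byte counted is the checksum */
    return 3 + (size_t)rec->length;
}
```

A caller would invoke omf_read_record repeatedly, advancing by the returned count each time and dispatching on rec->type (for example, 9Ah to the segment definition handler).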
-
- INTEL defines these other record types which are not supported by OBJASM.
-
- Record Type Type Name Description
- ----------- ----------- -----------------------------------
- 6Eh RHEADR R-Module Header Record
- 70h REGINT Register Initialization Record
- 72h REDATA Relocatable Enumerated Data Record
- 74h RIDATA Relocatable Iterated Data Record
- 76h OVLDEF Overlay Definition Record
- 78h ENDREC End Record
- 7Ah BLKDEF Block Definition Record
- 7Ch BLKEND Block End Record
- 7Eh DEBSYM Debug Symbols Record
- 82h LHEADR L-Module Header Record
- 84h PEDATA Physically Enumerated Data Record
- 86h PIDATA Physically Iterated Data Record
- 8Eh TYPDEF Type Definition Record
- 92h LOCSYM Local Symbols Record
- 94h LINNUM Line Number Record
- A4h LIBHED Library Header Record
- A6h LIBNAM Library Module Names Record
- A8h LIBLOC Library Module Locations Record
- AAh LIBDIC Library Value Dictionary
-
- MICROSOFT documents an obsolete method for generating communal records
- using the TYPDEF record. This obsolete method is not handled by OBJASM.
-
-
- The dis-assembler portion of OBJASM is a two-pass process. The first pass
- determines where local labels and symbols need to be placed, and the second
- outputs the dis-assembled instructions (with the labels and symbols).
-
- To determine whether to dis-assemble instructions or data, OBJASM performs
- a look ahead operation on each byte of the data records. If the byte is a
- printable character, then successive bytes are checked as well. The main
- module has two variables called 'code_string' and 'data_string' which are
- used to specify the minimum length of a string. The 'code_string' value
- is used in code segments while the 'data_string' value is used in data
- segments. If this minimum length of printable characters is exceeded, then
- the bytes will be output as a string. OBJASM is shipped with a value of 20
- for 'code_string' and 3 for 'data_string'. They work well for most .OBJ files.
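
The look-ahead test described above can be sketched as follows. This is not the code from oprocess.c: the function names are hypothetical, "printable" is taken to mean whatever the C library's isprint() accepts, and a run is assumed to qualify as a string once it reaches the minimum length (the text could also be read as requiring one byte more).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the consecutive printable bytes that
 * start at position pos in a data record of len bytes. */
size_t printable_run(const unsigned char *buf, size_t len, size_t pos)
{
    size_t n = 0;

    while (pos + n < len && isprint(buf[pos + n]))
        n++;
    return n;
}

/* Returns 1 when the run at pos is long enough to be output as a
 * string. min_len plays the role of 'code_string' or 'data_string'. */
int looks_like_string(const unsigned char *buf, size_t len,
                      size_t pos, size_t min_len)
{
    return printable_run(buf, len, pos) >= min_len;
}
```

With the shipped values, a five-character run would be treated as a string in a data segment (minimum 3) but not in a code segment (minimum 20).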
-
- Bytes which are not contained in strings are checked against an instruction
- lookup and processing routine. If it is determined that this byte will
- generate a valid instruction, then it is output as an instruction. All other
- bytes are output as simple data bytes.
-
- Some assemblers and compilers will generate .OBJ records which contain
- a portion of a string or instruction in one record and the remainder in the
- next .OBJ record. OBJASM does not handle this very well. Strings spanning
- more than one record will be divided. Instructions spanning more than one
- record will not be recognized. Output would look like this:
-
- String:
- db 'This is all o' it should be db 'This is all one string'
- db 'ne string'
-
- Instruction:
- db 0BBh it should be mov bx,MYSYMBOL
-
- dw MYSYMBOL
-
- OBJASM attempts to accommodate this by keeping a 16-byte overlap
- area. This overlap area is checked before processing the end of a record.
- This helps alleviate the problem for objects which are smaller than 16 bytes,
- something which is true for all normal instructions.
-
- For some other interesting quirks in OBJASM, please read the sections
- titled "Differences in .OBJ files which cannot be detected" and "Features
- allowable in .OBJ format which are not translatable into MASMable code".
-
- ADDITIONAL INFORMATION FILE
- ---------------------------
- The additional information file, specified with the -f option, is another
- method of specifying information to OBJASM. Lines in this file take one of
- the following formats:
-
- Format: Example:
- SEG segname segtype SEG DRIVER CODE
-
- This specifies that the segment named "segname" should be considered
- either data or code. The segtype should be either CODE or DATA indicating
- which type is desired. Normally OBJASM determines the segment using the
- segment's name, but specifying the segment name and segment type in this
- way allows overriding this determination.
-
- Format: Example:
- var = segname : offset TableY = DRIVER : 128
- TableX = DRIVER : 200h
-
- This specifies that a label should be placed within the segment "segname"
- at the offset "offset". This can be used to add labels to the dis-assembled
- code, and can also be used to override the internal labels generated by OBJASM.
-
- Format: Example:
- segname : offset : datatype DRIVER : 200h : DW
- DRIVER : 202h : DD
-
- This directs OBJASM to avoid the instruction/string look-ahead process
- and output data of a specified type. OBJASM outputs a data directive within
- the specified segment named "segname", at the offset "offset". The "datatype"
- can be any of these values: DB, DW, DD, DF, DQ, or DT. These correspond to
- the MASM base data types.
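
The offsets in the examples above appear both in decimal (128) and with an 'h' suffix (200h), and each data type names a fixed-size MASM directive. A reader of such a file might handle both as sketched below. The names parse_offset and datatype_size are hypothetical and not taken from oextra.c, and the sketch assumes upper-case type names as in the examples.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses "128" (decimal) or "200h" (hex with a
 * MASM-style suffix). Returns 0 on success, -1 on malformed input. */
int parse_offset(const char *text, unsigned long *out)
{
    char digits[16];
    char *end;
    size_t len = strlen(text);
    int base = 10;

    if (len == 0 || len >= sizeof digits)
        return -1;
    if (text[len - 1] == 'h' || text[len - 1] == 'H') {
        base = 16;
        len--;                           /* drop the suffix */
    }
    if (len == 0 || !isxdigit((unsigned char)text[0]))
        return -1;                       /* rejects signs and blanks */
    memcpy(digits, text, len);
    digits[len] = '\0';
    *out = strtoul(digits, &end, base);
    return *end == '\0' ? 0 : -1;
}

/* Size in bytes of each MASM data directive; 0 for an unknown name. */
int datatype_size(const char *name)
{
    static const struct { const char *name; int size; } types[] = {
        {"DB", 1}, {"DW", 2}, {"DD", 4}, {"DF", 6}, {"DQ", 8}, {"DT", 10}
    };
    size_t i;

    for (i = 0; i < sizeof types / sizeof types[0]; i++)
        if (strcmp(name, types[i].name) == 0)
            return types[i].size;
    return 0;
}
```

For the example lines, "DRIVER : 200h : DW" would give offset 512 and a 2-byte item, so the following "DRIVER : 202h : DD" entry starts exactly where the word ends.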
-
- Although this last format allows direct control of how OBJASM outputs
- data, care must be taken to put OBJASM into a state where it can actually
- use this information. If OBJASM is processing data as an instruction, it
- cannot be directed to output a data directive in the middle of that
- instruction. For example:
-
- Additional information: .OBJ FILE:
- SOME_SEGMENT: 1 : DW 8B360200 MOV SI,[0002]
-
- This would direct OBJASM to output a word in the middle of the MOV instruction.
- To accomplish this, the additional information file would also have to
- direct that the first byte of the MOV instruction be output as a data byte.
- To get it to work, you would have to have this:
-
- Additional information: .OBJ FILE:
- SOME_SEGMENT: 0 : DB 8B db 08Bh
- SOME_SEGMENT: 1 : DW 3602 dw 0236h
- 00 ...
-
- Other
- -----
-
- Yes, there always is a section which defies categorization...
- The OBJASM program is being continually refined. If you have any comments about
- its execution, wish to add features to it, or need to report bugs, please
- contact:
-
- Robert F. Day
- 19906 Filbert Drive
- Bothell, WA 98012
- (206) 481-8431
-
- The following section is a list of bugs that could not be fixed before
- shipping OBJASM. Please read the bug and problems lists to familiarize
- yourself with the known bugs and problem situations. The design notes are
- included to help you help me in refining OBJASM. If you need (or want) to
- add features to OBJASM, please contact me. We may be working on a similar
- addition and we may be able to save you some time.
-
- -------------------
- Bugs to Dec 14 1990
- -------------------
-
- 1. TYPDEF records are ignored (I had some Digital Research Libraries which
- used TYPDEF records, but Microsoft doesn't use them).
-
- 2. Some languages generate segment names which are the same as the names
- of labels within the segment. MASM won't allow this. Please rename
- the segment, if possible.
-
- --------------------------------------------------
- Differences in .OBJ files which cannot be detected
- --------------------------------------------------
-
- (1) if label2 = label1 + 010h
-
- MOV AX,label1 + 0010h ; Might have been like this
-
- and
-
- MOV AX,label2 ; OBJASM will generate this
- ; (Evaluates to equivalent address)
-
- Reason: Public labels are resolved in local code before being sent to
- the linker. Although the .OBJ specifications allow two places
- to store symbol addition information (above, the 0010h), MASM
- only uses one of them. This is a probable source of some other
- .OBJ differences. MASM automatically computes the offset of
- the public symbol and creates a fixup record indicating that
- the offset of the segment added to the offset of the public
- within the segment should be used (rather than just a fixup
- indicating that the offset of the public should be used).
-
- Handling: When resolving a reference to an address which is not equal to a
- public symbol, a new local symbol is created.
-
-
- -------------------------------------------------------------------------------
- Features allowable in .OBJ format which are not translatable into MASMable code
- -------------------------------------------------------------------------------
-
- (1) A piece of code like:
-
- DW _labeln - $ ; Relative (data form of local jmp/call)
-
- Reason: MASM will not assemble the above line. It is an equivalent data
- representation of relative 'JMP's and 'CALL's,
- where _labeln is the label to jump or call to.
-
- Handling: If OBJASM detects this type of code (almost all .OBJ files have
- it) then the data output function will substitute an
- actual value for the $ operator. This will be noted by a nasty
- comment. The dis-assembly output function will still work if
- they are truly 'JMP's or 'CALL's (Relative addressing is
- preceded by the jump short or call short opcodes).
-
- -----------------
- End of OBJASM.DOC
-